Does BLEU Score Work for Code Migration?
Statistical machine translation (SMT) is a fast-growing sub-field of
computational linguistics. To date, the most popular automatic metric for
measuring the quality of SMT is the BiLingual Evaluation Understudy (BLEU) score.
Lately, SMT along with the BLEU metric has been applied to a Software
Engineering task named code migration. (In)Validating the use of BLEU score
could advance the research and development of SMT-based code migration tools.
Unfortunately, no study has yet validated or invalidated the use of BLEU score
for source code. In this paper, we conducted an empirical study on BLEU score
to (in)validate its suitability for the code migration task, given its
inability to reflect the semantics of source code. In our work, we use human
judgment as the ground truth to measure the semantic correctness of the
migrated code. Our empirical study demonstrates that BLEU does not reflect
translation quality due to its weak correlation with the semantic correctness
of translated code. We provided counter-examples to show that BLEU is
ineffective in comparing the translation quality between SMT-based models. Due
to BLEU's ineffectiveness for the code migration task, we propose an alternative
metric, RUBY, which considers lexical, syntactical, and semantic representations
of source code. We verified that RUBY achieves a higher correlation coefficient
with the semantic correctness of migrated code: 0.775, compared with 0.583
for BLEU score. We also confirmed the effectiveness of RUBY in reflecting the
changes in translation quality of SMT-based translation models. With its
advantages, RUBY can be used to evaluate SMT-based code migration models.
Comment: 12 pages, 5 figures, ICPC '19 Proceedings of the 27th International
Conference on Program Comprehension
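To make the abstract's claim concrete, here is a minimal sketch of a simplified, unsmoothed, single-reference BLEU in plain Python. It is not the authors' evaluation code and it does not implement RUBY. The token sequences are hypothetical snippets invented for illustration, not counter-examples from the paper. The sketch only shows how n-gram overlap can rank a semantically wrong translation above a semantically equivalent one.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Simplified BLEU: one reference, uniform weights, no smoothing."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU 0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty punishes candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)


# Hypothetical example (assumption, not from the paper).
reference = "int x = a + b ;".split()
wrong = "int x = a - b ;".split()        # different semantics
equivalent = "int x = b + a ;".split()   # same semantics, reordered operands

bleu_wrong = sentence_bleu(wrong, reference)
bleu_equivalent = sentence_bleu(equivalent, reference)
```

In this toy case the semantically wrong candidate shares more n-grams with the reference than the equivalent one does, so it receives the higher score. Production BLEU implementations usually add smoothing and corpus-level aggregation, which this sketch omits.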
Deep Learning for Plant Identification and Disease Classification from Leaf Images: Multi-prediction Approaches
Deep learning plays an important role in modern agriculture, especially in
plant pathology using leaf images where convolutional neural networks (CNN) are
attracting a lot of attention. While numerous reviews have explored the
applications of deep learning within this research domain, there remains a
notable absence of empirical studies offering insightful comparisons, because
existing evaluations employ varied datasets. Furthermore, a majority of
these approaches tend to address the problem as a singular prediction task,
overlooking the multifaceted nature of predicting various aspects of plant
species and disease types. Lastly, there is an evident need for a more profound
consideration of the semantic relationships that underlie plant species and
disease types. In this paper, we start our study by surveying current deep
learning approaches for plant identification and disease classification. We
categorise the approaches into multi-model, multi-label, multi-output, and
multi-task, in which different backbone CNNs can be employed. Furthermore,
based on the survey of existing approaches in plant pathology and the study of
available approaches in machine learning, we propose a new model named
Generalised Stacking Multi-output CNN (GSMo-CNN). To investigate the
effectiveness of different backbone CNNs and learning approaches, we conduct an
intensive experiment on three benchmark datasets: Plant Village, Plant Leaves,
and PlantDoc. The experimental results demonstrate that InceptionV3 can be a
good choice for a backbone CNN, as it outperforms AlexNet,
VGG16, ResNet101, EfficientNet, MobileNet, and a custom CNN developed by us.
Interestingly, empirical results support the hypothesis that using a single
model can be comparable to or better than using two models. Finally, we show that
the proposed GSMo-CNN achieves state-of-the-art performance on three benchmark
datasets.
Comment: Jianping and Son are joint first authors (equal contribution
- …
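The abstract distinguishes multi-label from multi-output formulations of the species-plus-disease task. The sketch below illustrates only the difference in target encoding, following common machine-learning usage of the two terms. The paper's own definitions may differ, and GSMo-CNN itself is not shown. The label sets `SPECIES` and `DISEASES` and both helper functions are hypothetical names chosen for illustration.

```python
# Hypothetical label sets (assumption, not taken from the datasets).
SPECIES = ["apple", "tomato"]
DISEASES = ["healthy", "scab", "blight"]


def one_hot(label, classes):
    """Return a 0/1 vector with a single 1 at the position of `label`."""
    return [1 if c == label else 0 for c in classes]


def multi_label_target(species, disease):
    """Multi-label: one vector over the union of species and disease labels,
    typically paired with a single sigmoid output layer."""
    return one_hot(species, SPECIES) + one_hot(disease, DISEASES)


def multi_output_target(species, disease):
    """Multi-output: one target per output head, typically paired with
    a separate softmax head for each aspect."""
    return {
        "species": one_hot(species, SPECIES),
        "disease": one_hot(disease, DISEASES),
    }


flat = multi_label_target("apple", "scab")
heads = multi_output_target("apple", "scab")
```

Under these assumptions, `flat` is a single five-element vector with two active entries, while `heads` keeps the species and disease targets separate so each head can be trained with its own loss.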